Skip to content

Conversation

@rbpittman
Copy link
Contributor

@rbpittman rbpittman commented Oct 27, 2025

Description of PR

Summary:
Fixes issue with order of operations when syncd container is dynamically swapped during test_tunnel_qos_remap.py with respect to the VOQ Watchdog disabler.
The VOQ watchdog fixture runs first because it has no dependency on the syncd swap. Thus the WD is temporarily disabled, but then the syncd-swap (config reload) reload re-enables it.
Fix by requiring the syncd_swap to be performed before the disable_voq_watchdog_dualtor fixture runs.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

  • Validated on master branch qos/test_tunnel_qos_remap.py on hash 20aae03 without extra parameters (for example, without "--qos_swap_syncd=False" or disabling loganalyzer)
  • Validated on 202411 active watermark test case passes
=============================================== 10 passed, 7 skipped, 851 warnings in 8220.27s (2:17:00) =============
sonic-mgmt$ git log -n 1                                                                                
commit 20aae0310ff04063c2a65d0045eac41a41728be5 (HEAD -> voq_wd_syncd_swap_issue_master, origin/voq_wd_syncd_swap_issue_master)          
Author: Randall Pittman <[email protected]>                                                                                             
Date:   Wed Nov 5 23:25:14 2025 +0000                                                                                                    
                                                                                                                                         
    Require swap_syncd before setup_module.                                                                                              

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lolyu lolyu requested a review from XuChen-MSFT October 28, 2025 10:51
Copy link
Contributor

@lolyu lolyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lolyu
Copy link
Contributor

lolyu commented Oct 28, 2025

Hi @StormLiangMS, please help review/signoff this one

Copy link
Contributor

@yyynini yyynini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this causes the test_pfc_watermark_extra_lossless_active test to fail.
For inner_dscp, outer_dscp, prio, queue = (3, 2, 3, 3): E Failed to detect congestion due to PFC pause, failed check 19200 > 29200 E For inner_dscp, outer_dscp, prio, queue = (4, 6, 4, 4): E Failed to detect congestion due to PFC pause, failed check 19200 > 29200
Could you help me understand why this is happening?
test_pfc_watermark_extra_lossless_active.log

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rbpittman
Copy link
Contributor Author

@yyynini added a dependency for setup_module for swapsyncd to make sure counterpolling is disabled after the swap.

@lolyu
Copy link
Contributor

lolyu commented Nov 6, 2025

Hi @yyynini, please verify the latest change, thanks!

@yyynini
Copy link
Contributor

yyynini commented Nov 6, 2025

@rbpittman I'm still seeing the same failure in test_pfc_watermark_extra_lossless_active.
E AssertionError: Watermark failures were found: E For inner_dscp, outer_dscp, prio, queue = (3, 2, 3, 3): E Failed to detect congestion due to PFC pause, failed check 19200 > 29200 E For inner_dscp, outer_dscp, prio, queue = (4, 6, 4, 4): E Failed to detect congestion due to PFC pause, failed check 19200 > 29200
Pls let me know if there's anything you'd like me to try or collect.

@rbpittman
Copy link
Contributor Author

@yyynini Updated description, I got a clean pass for the entire file running off hash 20aae03. Also got a pass for the watermark test case on a 202411 double commit preview, both with the double-swap-syncd and single-swap-syncd fixture change.
Any chance the fix isn't making it into the test run?

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@yyynini yyynini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire test_tunnel_qos_remap run passed successfully.
lgtm

Copy link
Contributor

@XuChen-MSFT XuChen-MSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@StormLiangMS StormLiangMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@StormLiangMS StormLiangMS merged commit 1c690b7 into sonic-net:master Nov 11, 2025
16 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Nov 11, 2025
What is the motivation for this PR?
How did you do it?
How did you verify/test it?
 Validated on master branch qos/test_tunnel_qos_remap.py on hash 20aae03 without extra parameters (for example, without "--qos_swap_syncd=False" or disabling loganalyzer)
 Validated on 202411 active watermark test case passes
=============================================== 10 passed, 7 skipped, 851 warnings in 8220.27s (2:17:00) =============
sonic-mgmt$ git log -n 1                                                                                
commit 20aae03 (HEAD -> voq_wd_syncd_swap_issue_master, origin/voq_wd_syncd_swap_issue_master)          
Author: Randall Pittman <[email protected]>                                                                                             
Date:   Wed Nov 5 23:25:14 2025 +0000                                                                                                    
                                                                                                                                         
    Require swap_syncd before setup_module.
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202411: #21302

@mssonicbld
Copy link
Collaborator

@rbpittman PR conflicts with 202505 branch

@rbpittman rbpittman deleted the voq_wd_syncd_swap_issue_master branch November 12, 2025 13:44
dcaugher pushed a commit to dcaugher/sonic-mgmt that referenced this pull request Nov 12, 2025
What is the motivation for this PR?
How did you do it?
How did you verify/test it?
 Validated on master branch qos/test_tunnel_qos_remap.py on hash 20aae03 without extra parameters (for example, without "--qos_swap_syncd=False" or disabling loganalyzer)
 Validated on 202411 active watermark test case passes
=============================================== 10 passed, 7 skipped, 851 warnings in 8220.27s (2:17:00) =============
sonic-mgmt$ git log -n 1                                                                                
commit 20aae03 (HEAD -> voq_wd_syncd_swap_issue_master, origin/voq_wd_syncd_swap_issue_master)          
Author: Randall Pittman <[email protected]>                                                                                             
Date:   Wed Nov 5 23:25:14 2025 +0000                                                                                                    
                                                                                                                                         
    Require swap_syncd before setup_module.
dcaugher pushed a commit to dcaugher/sonic-mgmt that referenced this pull request Nov 12, 2025
What is the motivation for this PR?
How did you do it?
How did you verify/test it?
 Validated on master branch qos/test_tunnel_qos_remap.py on hash 20aae03 without extra parameters (for example, without "--qos_swap_syncd=False" or disabling loganalyzer)
 Validated on 202411 active watermark test case passes
=============================================== 10 passed, 7 skipped, 851 warnings in 8220.27s (2:17:00) =============
sonic-mgmt$ git log -n 1                                                                                
commit 20aae03 (HEAD -> voq_wd_syncd_swap_issue_master, origin/voq_wd_syncd_swap_issue_master)          
Author: Randall Pittman <[email protected]>                                                                                             
Date:   Wed Nov 5 23:25:14 2025 +0000                                                                                                    
                                                                                                                                         
    Require swap_syncd before setup_module.
mssonicbld pushed a commit that referenced this pull request Nov 17, 2025
What is the motivation for this PR?
How did you do it?
How did you verify/test it?
 Validated on master branch qos/test_tunnel_qos_remap.py on hash 20aae03 without extra parameters (for example, without "--qos_swap_syncd=False" or disabling loganalyzer)
 Validated on 202411 active watermark test case passes
=============================================== 10 passed, 7 skipped, 851 warnings in 8220.27s (2:17:00) =============
sonic-mgmt$ git log -n 1                                                                                
commit 20aae03 (HEAD -> voq_wd_syncd_swap_issue_master, origin/voq_wd_syncd_swap_issue_master)          
Author: Randall Pittman <[email protected]>                                                                                             
Date:   Wed Nov 5 23:25:14 2025 +0000                                                                                                    
                                                                                                                                         
    Require swap_syncd before setup_module.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants